(54) Title: IMPROVED TURBO DECODER 1



(57) Abstract 

In a radio telecommunications system operating an iterative version of the MAP
algorithm, a decoder for a data source or data receiver comprises three memories
(80, 82, 84) arranged to perform in parallel the steps of updating the NDP alpha
metrics in a forward direction; updating the NDP beta metrics in a backward
direction; and computing NDP values of the output probability distribution. The
arrangement can be regarded as a split RAM SISO architecture.
IMPROVED TURBO DECODER 1

This invention relates to a decoder used in a radio telecommunications system in
which digital data has been coded by a turbo code to improve performance, and
relates especially to a de-interleaver in the decoder.

One type of code which is applied in radio telecommunications systems is a
concatenated convolutional code ("turbo code"), which is decoded using an iterative
version of the well known Maximum A Posteriori (MAP) algorithm. The core of the MAP
decoding algorithm is a procedure to derive the sequence of probability distributions
over the information symbol alphabet based on the received signal and constrained by
the code structure. The MAP algorithm is described by L R Bahl, J Cocke, F Jelinek
and J Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate",
IEEE Transactions on Information Theory, pp. 284-287, March 1974. Efficient
implementation of the algorithm in an integrated circuit is not easy, and in some
versions the algorithm is modified.

In one version, a logarithmic approximation is used, which renders the algorithm
additive by replacing all metrics with logarithmic quantities, as described by S Benedetto,
D Divsalar, G Montorsi and F Pollara, "Soft-output decoding algorithms for continuous
decoding of parallel concatenated convolutional codes", Proceedings of ICC'96, Dallas,
Texas, June 1996 and S Benedetto, D Divsalar, G Montorsi and F Pollara, "Soft-input
soft-output modules for the construction and distributed iterative decoding of code
networks", European Transactions on Telecommunications, vol. ETT 9, March/April
1998.
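
As an illustration of why working with logarithmic quantities makes the algorithm
additive, the following minimal sketch shows how a sum of two probabilities can be
evaluated from their logarithms, either exactly or with the common "max" approximation.
The function names max_star and max_log are hypothetical and are not taken from the
cited references.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact log(exp(a) + exp(b)), written as a maximum plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Approximation that drops the correction term (illustrative assumption)."""
    return max(a, b)


if __name__ == "__main__":
    la, lb = math.log(0.2), math.log(0.3)
    print(max_star(la, lb), math.log(0.5))  # the two values agree
    print(max_log(la, lb))                  # slightly smaller than the exact value
```

With either function, products of probabilities become additions of metrics and sums of
probabilities become a compare-and-select, which is the operation an ACS unit performs.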

In another version, also described in the two references just given, a sliding
window version of the algorithm is provided, where the decoder operates on a fixed
memory length, instead of requiring that the whole transmitted sequence is stored.

Even in these simplified versions, the decoding algorithm remains critical for
high rate applications. The requirement of performing a substantial number of decoding
iterations while supporting a high data rate can pose serious implementation problems,
especially for an implementation in low cost ASICs.

Conventionally, a complete turbo decoder includes two kinds of blocks:
the first kind comprises Soft Input Soft Output (SISO) stages which implement the
MAP algorithm, and the second kind comprises de-interleavers, which scramble
(de-scramble) the processed data according to the interleaving laws used in the encoder.
Other kinds of blocks may also be required for the implementation of the decoder, such
as RAM memories for storing data through the iterations or synchronization circuits.
These blocks can be inter-connected in the decoder in many topologies, as described in
the paper in ETT Volume 9 given above. Two alternative decoding strategies can be
adopted: if a single instance for each SISO stage is allocated, the required iterations are
performed serially (serial decoding) and the overall decoding speed is Nit times slower
than the speed of the SISO, where Nit is the number of iterations. As an alternative,
parallel decoding can be adopted, where multiple SISO stages are allocated for each
iteration and the global decoding speed is equal to that of the SISO stage. In this case,
a feed-forward structure is used, consisting of a chain of blocks that process the data
in a pipeline.
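
The trade-off between the two strategies can be put into numbers with a small sketch.
The function names are hypothetical, and the default of two SISO stages per iteration
is an assumption based on the two decoder core elements of Figure 3b.

```python
def decoder_rate(siso_rate: float, n_it: int, parallel: bool) -> float:
    """Overall decoding speed for a SISO speed `siso_rate` and Nit iterations."""
    if n_it < 1:
        raise ValueError("n_it must be at least 1")
    # Serial decoding reuses the same SISO hardware Nit times per block.
    return siso_rate if parallel else siso_rate / n_it


def siso_instances(n_it: int, parallel: bool, stages_per_iteration: int = 2) -> int:
    """Number of SISO stages to allocate (stages_per_iteration is an assumption)."""
    return stages_per_iteration * (n_it if parallel else 1)


if __name__ == "__main__":
    for parallel in (False, True):
        print(parallel, decoder_rate(8.0, 8, parallel), siso_instances(8, parallel))
```

Parallel decoding therefore buys a factor of Nit in speed at the price of Nit times as
many SISO stages, which is why the cost of each stage matters.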

An important role is played in the decoder by the interleaver and the
de-interleaver, which usually add a large contribution to the whole cost, mainly for three
reasons:

1 the error rate performance of the code is inversely proportional to the
interleaving length, thus the interleaver implementation involves a large RAM;

2 the size of the memory devoted to the storage of the reliabilities
computed by the de-mapper is also related to Nint, the Number of Interleavings;

3 it has been proved that, at least for high signal to noise ratios, the best
performances are obtained with randomly generated interleaving patterns: this means
that the required sequence of addresses cannot be obtained through simple
computations but quite large ROMs must be allocated.

It is an object of the invention to reduce the complexity of a decoder circuit.

According to the invention, a method of decoding concatenated convolutional
codes comprises the steps of storing a block of NDP input branch probabilities in a
memory; from the stored block:

updating NDP alpha metrics in a forward direction;
updating NDP beta metrics in a backward direction; and

computing NDP values of the output probability distributions;
characterised in that the two updating steps and the computing step are
performed in parallel. NDP is the length or width of the sliding window.

Also according to the invention, a radio telecommunications system in which
each data source and data receiver comprises a digital coder/decoder for applying
concatenated convolutional codes, characterised by a decoder comprising three
memories each having a single read-modify-write access, the memories being arranged
to perform in parallel the steps of updating NDP alpha metrics in a forward direction;
updating NDP beta metrics in a backward direction; and computing NDP values of the
output probability distributions.

The code may be an iterative version of the Maximum A Posteriori Algorithm.

The prior art will be described with reference to Figures 1, 2, 3 and 4a of the 
accompanying drawings in which:- 

Figure 1 illustrates schematically a part of a radio telecommunications system; 

Figure 2 illustrates schematically a part of the encoding/decoding and
modulating/demodulating part of the radio telecommunications system;

Figure 3a illustrates schematically a more detailed view of a convolutional 
encoder and 3b is a more detailed view of a convolutional decoder. 

The invention will be described with reference to Figures 4, 5 and 6 in which:-

Figure 4 illustrates the timing of memory operations by the use of three RAMs;

Figure 5 illustrates the decoder architecture incorporating three RAMs; and

Figure 6 illustrates a modified version of Figure 5. 

In Figure 1, a radio telecommunication system comprises a Core Network (CN)
10 having an interface 12 with a Radio Access Network (RAN) 14 which in turn has an
interface 16 with a plurality of mobile users 18, 20. In the RAN 14 are two Base
Station Controllers (BSC) 22, 24 each controlling two Base Transceiver Stations
(BTS) 26, 28, 30, 32. In practice there will be many BSCs controlling many BTSs.

Referring now to Figure 2, in a transmitter 40 a data source 42 supplies data to
an encoder 44 which applies the MAP algorithm to the data; the encoder output is
connected to a modulator 46 which modulates the encoder data signal, and supplies it
to a communications channel 48. In a receiver 50, a demodulator 52 receives a signal
from the channel 48, demodulates it and passes the signal to a decoder 52. The
decoder 52 decodes the data signal and supplies it to user equipment 54, such as one of
the mobiles 18, 20.

While traversing the channel 48 the signal is subject to noise N, to fading F, and
to frequency offset O; to ameliorate these deleterious effects, coding such as turbo
coding is used.

Figure 3a illustrates an encoder 44. An incoming stream of bits N passes to a
first buffer 60 where the stream is divided and passes to a first convolutional encoder
62 and also through an interleaver 64 and a second buffer 66 to a second convolutional
encoder 68. The first encoder operates at rate 1/2 and provides two encoded output
signals X1 and X2. The second encoder operates at rate 1 and provides a third output
signal X3.

Figure 3b illustrates a decoder 70, comprising a first decoder core element 72
operating on SISO principles, an interleaver 74, a second decoder core element 76 also
operating on SISO principles, and a de-interleaver 78. The signal X1 is supplied to the
first decoder element 72 and the signal X2 is supplied to the second decoder element
76. The third signal X3 is supplied to both decoders. The output of the de-interleaver
is a decoded data signal.

Each mobile 18, 20 in Figure 1 and each BTS 26, 28, 30, 32 is provided with an 
encoder and a modulator, and also with a decoder and a de-modulator. 

Reference has been made above to the use of the sliding window version of the
MAP algorithm, in which the decoder operates on a fixed memory length. An
improvement was proposed by A J Viterbi, "An intuitive justification of the MAP decoder
for convolutional codes", IEEE Journal on Selected Areas in Communications, vol. 16,
no. 2, Feb. 1998, as an alternative to the storage of the entire state metric history.

The basic idea of this solution is to double the extension of the sliding window
from N... D... P... (NDP) to 2 NDP: in this way, after the initial NDP steps of the
backward recursion required by the algorithm, the computed beta values have a
memory longer than NDP (in the sense that they include the contribution of more than
NDP branch metrics through the trellis) and can therefore be used directly to feed an
output.

To describe the proposed SISO architecture, let us define as S the operation of
storing a block of NDP input branch probabilities in a RAM; according to the MAP
algorithm, this block of written values must be read three times for performing the
following operations:

1 Operation A, which is the updating of NDP alpha metrics in the forward
direction;

2 Operation B, which is the updating of NDP beta metrics in the backward
direction;

3 Operation P, which is the computing of NDP values of the output
probability distribution.

In the inventive arrangement, these three operations are performed in parallel.
In order to avoid the use of a multiple port RAM, three separated memories of depth
NDP are used so that the defined operations operate on them in a cyclic way. One
additional memory is required for temporarily storing the computed values of alpha.
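
To make operations A, B and P concrete, the following is a minimal software sketch of the
three passes over one stored block of branch metrics. It is not the hardware arrangement
of the invention: it assumes log-domain metrics with the "max" approximation, a trellis
given as a list of (from_state, to_state, input_bit) edges, and one branch metric per edge
and per step. All names are hypothetical.

```python
NEG_INF = float("-inf")


def forward_alpha(gamma, edges, n_states, alpha0):
    """Operation A: update the alpha metrics in the forward direction."""
    alphas = [alpha0]
    for g in gamma:
        nxt = [NEG_INF] * n_states
        for e, (s, t, _u) in enumerate(edges):
            nxt[t] = max(nxt[t], alphas[-1][s] + g[e])  # add, compare, select
        alphas.append(nxt)
    return alphas


def backward_beta(gamma, edges, n_states, beta_end):
    """Operation B: update the beta metrics in the backward direction."""
    betas = [beta_end]
    for g in reversed(gamma):
        prev = [NEG_INF] * n_states
        for e, (s, t, _u) in enumerate(edges):
            prev[s] = max(prev[s], betas[0][t] + g[e])
        betas.insert(0, prev)
    return betas


def output_llr(gamma, edges, alphas, betas):
    """Operation P: combine alpha, branch metric and beta into soft outputs."""
    llrs = []
    for k, g in enumerate(gamma):
        best = {0: NEG_INF, 1: NEG_INF}
        for e, (s, t, u) in enumerate(edges):
            best[u] = max(best[u], alphas[k][s] + g[e] + betas[k + 1][t])
        llrs.append(best[1] - best[0])
    return llrs
```

Each of the three functions reads the same block `gamma` once, which corresponds to the
three reads of the stored block mentioned above; operation P needs the alpha values of
the whole block, which is why they have to be kept in an additional memory.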

Figure 4 illustrates the inventive arrangement of three RAMs 80, 82, 84 in a
decoder such as 72 or 76. A sequence of 6 phases is indicated where a phase indicates
a processing of a whole block of NDP input metrics.

In the figure it can be seen that the B operation is performed twice for each
block: the first time beta values are updated for the first NDP steps, but are not used for
the SISO output calculation; the second time the beta updating is continued for the
NDP following steps and NDP SISO outputs are evaluated at the same time.

1 In phase 1, the first NDP long block of branch metrics is stored in RAM
80 (S1).

2 In phase 2, the beta value is calculated on the first block (B1 INIT), the
first block is moved to the second RAM 82, and a second block of branch metrics is
stored in the first RAM 80 (S2).

It will be apparent that read-modify-write access is required for the RAMs 80,
82, 84.

3 In phase 3 the alpha values are calculated on the first block (A1) and are
stored in a separate memory for subsequent use; the first block is moved to the third
RAM 84; the beta values are calculated on the second block (B2 INIT) and the second
block is moved to the second RAM 82; a third block of branch metrics is stored in the
first RAM 80 (S3).

4 In phase 4 the content of RAM 84 is read in the reverse order for
sequentially calculating the current values of beta (B1) and the associated output
probabilities (P1); this calculation also makes use of the alpha values stored in the
separate RAM, which can be re-used for the writing of the new alpha values evaluated on
the second block (A2); at the same time, the initialization is performed on the third block
(B3 INIT); the second block S2 is discarded; the third block S3 is moved to the third
memory 84; and a new block S4 is read into the first RAM 80.

5 In phase 5 the values B2 and P2 are calculated for the second block in
the third RAM 84, a value of A3 for the third block is calculated in the second memory
82, and a value B4 INIT is calculated in the first memory 80; the block S3 is discarded;
block S4 is moved to the third memory 84 and block S5 is moved to a second memory
82 while a third block is introduced.

6 In phase 6 values of B3 and P3 are calculated in the third RAM 84, so
that all required values are available for the first three blocks. As before a new block of
data is stored in the first memory 80, and other blocks are shifted along one memory
then discarded.
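
The overlap of operations in the six phases above can be tabulated with a short sketch.
The function name phase_operations is hypothetical, and continuing the same pattern as a
steady state for every later phase is an assumption drawn from the regularity of phases
4 to 6.

```python
def phase_operations(phase: int) -> list:
    """Operations running in parallel during one phase (blocks numbered from 1)."""
    if phase < 1:
        raise ValueError("phases are numbered from 1")
    ops = [f"S{phase}"]                    # first RAM 80: store the new block
    if phase >= 2:
        ops.append(f"B{phase - 1} INIT")   # first RAM 80: beta initialization
    if phase >= 3:
        ops.append(f"A{phase - 2}")        # second RAM 82: alpha update
    if phase >= 4:
        ops.append(f"B{phase - 3}")        # third RAM 84: beta update, reverse read
        ops.append(f"P{phase - 3}")        # third RAM 84: output probabilities
    return ops


if __name__ == "__main__":
    for p in range(1, 7):
        print(p, ", ".join(phase_operations(p)))
```

From phase 4 onwards every RAM is read once and written once per phase, which is
consistent with the single read-modify-write access required of each memory.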

Figure 5 illustrates the architecture of a decoder with three RAMs 80, 82, 84
each having a single input and read-modify-write access 81, 83, 85. Each RAM is
connected to an associated Add Compare Select (ACS) unit 86, 88, 90 operating
respectively on the stages B INIT, A and B as defined above, with respective input circuits
92, 94, 96. Input 92 receives three signals; input 94 receives two signals and input 96
receives two signals and the output from the ACS 86.

The output from ACS 88 associated with the second memory 82 is supplied to
an additional storage stage 98 which stores temporarily the computed alpha values.
There is an additional ACS circuit 100, which receives signals from storage 98, from
ACS 90, and from RAM 84, and provides an output signal.

In comparison with the prior art, the ACS sections are allocated instead of an
NDP long pipelining of ACS stages. This simplification does not affect the
performance of the decoder, as the throughput is limited by the single ACS.

The modification to the algorithm and the architecture according to the
invention provide a substantial cost reduction.

WO 00/57560    PCT/GB00/01068

It will be appreciated that the SISO outputs are obtained in reverse order so
that, to obtain the correct order in the output signal, either the interleaving law of the
decoder (74 or 78 in Figure 3b) must be changed, or the decoding stage can be
arranged to operate on a Last In First Out principle.
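
The Last In First Out option can be sketched in a few lines. The sketch assumes that the
reversal applies within each block of NDP outputs, since each block is read from the
third RAM in reverse order; the function name restore_order is hypothetical.

```python
def restore_order(siso_outputs, ndp):
    """Re-reverse each NDP long block of SISO outputs (Last In First Out)."""
    if ndp < 1:
        raise ValueError("ndp must be at least 1")
    restored = []
    for start in range(0, len(siso_outputs), ndp):
        block = siso_outputs[start:start + ndp]
        restored.extend(reversed(block))  # the last value written is read first
    return restored


if __name__ == "__main__":
    print(restore_order([3, 2, 1, 6, 5, 4], ndp=3))
```

The alternative mentioned above, changing the interleaving law, would fold the same
permutation into the addresses generated for the interleaver 74 or de-interleaver 78.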

Inspection of Figure 5 will show that the input probabilities PI(u;I) and PI(c;I) are
added at each stage at each ACS section 86, 88, 90.

A variation of an architecture to implement the invention is shown in Figure 6.
There is an additional ACS stage 102, which precedes the ACS stages 86, 88, 90 and is
arranged to add the input probabilities. The sum is supplied to the first memory 80. By
use of such an arrangement the number of adders can be halved.

An associated requirement is that the width of each of the memories 80, 82, 84
must be increased, so that the reduction in silicon area achieved by omission of half the
adders is cancelled at least partly. The actual result in terms of surface area of silicon
must be evaluated for each individual case.

However, from the point of view of performance, the arrangement of Figure 6
always results in an increased throughput of the SISO decoder: in fact, if the adding of
input probabilities is performed inside the ACS processor it increases the global ACS
delay; as the ACS has the outputs connected to the input through a feedback loop, it is
the decoder bottleneck, and moving operations from inside to outside the ACS
processor immediately gives better performance.
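
The difference between adding the input probabilities inside the ACS and adding them
once beforehand can be illustrated as follows. This is a sketch under stated assumptions,
not the circuit of Figure 5 or Figure 6: metrics are taken to be log-domain values, the
inputs are indexed by the information label u and the code label c, and all function
names are hypothetical.

```python
def pre_add(pi_u, pi_c):
    """Add the two input probabilities once, for every (u, c) label pair."""
    return {(u, c): pu + pc
            for u, pu in enumerate(pi_u)
            for c, pc in enumerate(pi_c)}


def acs_internal_add(prev, incoming, pi_u, pi_c):
    """Addition of the inputs sits inside the ACS feedback loop."""
    return max(prev[s] + (pi_u[u] + pi_c[c]) for s, u, c in incoming)


def acs_pre_added(prev, incoming, branch):
    """The ACS adds one stored sum, then compares and selects."""
    return max(prev[s] + branch[(u, c)] for s, u, c in incoming)


if __name__ == "__main__":
    pi_u, pi_c = [-0.5, -1.0], [-0.25, -2.0]
    prev = [0.0, -1.5]
    incoming = [(0, 0, 0), (1, 1, 1)]  # (previous state, u, c) for one target state
    print(acs_internal_add(prev, incoming, pi_u, pi_c))
    print(acs_pre_added(prev, incoming, pre_add(pi_u, pi_c)))
```

Both variants give the same new state metric; the pre-added form stores one sum per label
pair, which reflects the wider memories mentioned above, and leaves a single addition
inside the loop.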

The embodiment has been described with reference to the Global System for
Mobile Communications (GSM), but the principles described above apply equally to the
Universal Mobile Telecommunications System (UMTS) and to the Enhanced Data Rates for
GSM Evolution (EDGE) system, and to any other mobile telecommunication system
using any coding system, but especially turbo coding.
CLAIMS

1 A method of decoding concatenated convolutional codes comprising the
steps of

storing a block of NDP input branch probabilities in a memory;
from the stored block:-

updating NDP alpha metrics in a forward direction;
updating NDP beta metrics in a backward direction; and
computing NDP values of the output probability distributions;
characterized in that the two updating steps and the computing step are
performed in parallel.

2 A method according to Claim 1 in which for each block the updating of
the NDP beta metrics is performed in two stages:-

(a) in a first step the beta values are updated for the first NDP steps;

(b) in a second step the beta values are updated for the
following/subsequent NDP steps and NDP SISO outputs are simultaneously evaluated.

3 A method according to Claim 1 or Claim 2 in which the computing steps
are performed in parallel in six phases comprising:-

in phase 1, storing (S1) a first NDP long block of branch metrics in a first
memory;

in phase 2, performing an initialization (B1INIT) on said first block in the first
memory, moving the first block to a second memory, and storing (S2) a second NDP
long block of branch metrics in the first memory;

in phase 3, calculating alpha values for the first block (A1) in the second
memory, storing the alpha values, and moving the first block to a third memory;
performing an initialization (B2INIT) on said second block in the first memory, moving
the second block to the second memory, and storing (S3) a third NDP long block of
branch metrics in the first memory;

in phase 4, calculating current beta values (B1) and associated output
probabilities (P1) for the first block in the third memory, and discarding the first block;
calculating alpha values for the second block (A2) in the second memory and
storing the alpha values, and moving the second block to the third memory; performing
an initialization (B3INIT) on said third block in the first memory, and moving the third
block to the second memory;

in phase 5, calculating current beta values (B2) and associated output
probabilities (P2) for the second block in the third memory and discarding the second
block; calculating alpha values for the third block (A3) in the second memory and
storing the alpha values, and moving the third block to the third memory; and

in phase 6, calculating current beta values B3 and associated output probability
P3 on the third block in the third memory.

4 A method according to Claim 1, 2 or 3 comprising a further step of
deriving a decoded signal from at least the first three blocks of code.

5 A method according to Claim 2 or Claim 3 or Claim 4 in which for
calculation of the current beta values B1, B2 and B3 and associated output probabilities
P1, P2 and P3 the contents of the third memory are read in reverse order.

6 A method according to Claim 5 further comprising the initial step of
storing the input probabilities PI(u;I) and PI(c;I) and supplying the sum to the first
memory.

7 A radio telecommunications system in which each data source and data
receiver (18, 20, 26, 28, 30, 32) comprises a digital coder/decoder (44, 54) for applying
concatenated convolutional codes characterised by a decoder comprising three
memories (80, 82, 84) each having a single read-modify-write access (81, 83, 85), the
memories being arranged to perform in parallel the steps of updating NDP alpha
metrics in a forward direction; updating NDP beta metrics in a backward direction, and
computing NDP values of the output probability distributions.

8 A decoder according to Claim 7 in which for each block of NDP input
branch probabilities the second memory (82) is arranged to calculate the updated alpha
values from a previous NDP step in which the first memory (80) is arranged to
calculate the current beta values from a previous NDP step; and the third memory (84)
is arranged to calculate updated beta values for subsequent NDP steps and to calculate
the output probability distributions.

9 A decoder according to Claim 7 in which an additional memory (98) is
connected to the second memory (82) and is arranged to store temporarily values of
alpha calculated in the second memory.

10 A decoder according to any one of Claims 7, 8 or 9 further comprising 
an Add Compare Select means (102) arranged to add the input probabilities and to 
provide the sum to the first memory (80). 